Abstract:
Traditional data storage is row-oriented and well suited to write-intensive transaction processing, but it is poorly suited to many read-intensive analytical workloads. Data mining algorithms are analytical in nature and extract hidden information from large volumes of structured and unstructured data. Because they rely mainly on read, search, and lookup operations for data aggregation, they stand to benefit from column-oriented storage rather than traditional row-oriented storage. In column-oriented database systems (column stores), each database column is stored separately and contiguously, compressed and densely packed, whereas traditional database systems store entire records (rows) one after another. In this paper we review the architecture of several open-source column-oriented databases, namely InfiniDB, MonetDB, and Infobright. We compare the performance of column stores and row stores for a simple tree-based classification algorithm and the CAIM discretization algorithm. We also propose a novel rule-based storage structure for the classification model that offers a simple and efficient means of storage and access. The superior performance of the algorithms on column stores addresses the CPU utilization issues of such large-scale, data-intensive applications.

Keywords: Data Mining (DM), OLAP, OLTP, Column store, Row store, Classification, ID3, Discretization, CAIM
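
The storage distinction drawn in the abstract can be illustrated with a minimal Python sketch. It is only a toy in-memory model, not the layout used by InfiniDB, MonetDB, or Infobright; the table contents and the names `rows`, `to_columns`, `mean_income_row_store`, and `mean_income_column_store` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy table of (id, age, income) records, stored row by row.
rows: List[Tuple[int, int, float]] = [
    (1, 25, 30000.0),
    (2, 40, 52000.0),
    (3, 31, 41000.0),
]


def to_columns(records: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Transpose row-oriented records into one contiguous list per column."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(names)}


def mean_income_row_store(records: Sequence[tuple]) -> float:
    # Row layout: every full record is visited although only one field is needed.
    return sum(rec[2] for rec in records) / len(records)


def mean_income_column_store(columns: Dict[str, list]) -> float:
    # Column layout: only the "income" column is read; "id" and "age" are untouched.
    income = columns["income"]
    return sum(income) / len(income)


columns = to_columns(rows, ["id", "age", "income"])
```

Both functions return the same aggregate; the difference is how much data each must touch. In the row layout the scan passes over whole records, while in the column layout it reads a single contiguous list, which is the access pattern that read-heavy aggregation in data mining tends to favor.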